62 research outputs found

    A Short Overview of Executing Γ Chemical Reactions over the ΣC and τC Dataflow Programming Models

    Many-core processors offer high computational power while keeping energy consumption reasonable compared to complex processors. Today, they are entering both high-performance computing systems and embedded systems. However, these processors require dedicated programming models to benefit efficiently from their massively parallel architectures. The chemical programming paradigm was introduced in the late eighties as an elegant way of formally describing distributed programs. Data are seen as molecules that can freely react, thanks to operators, to create new data. This paradigm has also been used in the context of grid computing and now seems relevant for many-core processors. Very few runtime implementations for chemical programming have been proposed, and none of them gives serious indications of how it can be deployed onto a real architecture. In this paper, we propose to implement some parts of the chemical paradigm over the ΣC dataflow programming language, which is dedicated to many-core processors. We show how to represent molecules using agents and communication links, and how to iteratively build the dataflow graph following the chemical reactions. A preliminary implementation of the chemical reaction mechanisms is provided using the τC dataflow compilation toolchain, a language close to ΣC, in order to demonstrate the relevance of the proposal.

    Data Consistency Models and Protocols in Volatile Environments (Modèles et protocoles de cohérence des données en environnement volatil)

    This report addresses the problem of visualizing shared data in code-coupling applications on grids. We propose to improve the efficiency of visualization by acting on the mechanisms that manage replicated data, and more specifically on the consistency protocol. The notion of relaxed read is introduced as an extension of the entry consistency model. This new type of operation can be performed without acquiring a lock, in parallel with writes. In return, the user relaxes the constraints on data freshness and accepts reading slightly older versions, whose staleness is nevertheless bounded. The implementation of this approach within the JuxMem grid data-sharing service shows considerable gains compared with a classical implementation based on lock-acquiring reads.

    Throughput constrained parallelism reduction in cyclo-static dataflow applications

    This paper deals with semantics-preserving parallelism reduction methods for cyclo-static dataflow applications. Parallelism reduction is the process of fusing equivalent actors. Its principal objectives are to decrease the memory footprint of an application and to increase its execution performance. We focus on parallelism reduction methodologies constrained by application throughput. A generic parallelism reduction methodology is introduced. Experimental results are provided to assess the performance of the proposed method.

    Introducing a Data Sliding Mechanism for Cooperative Caching in Manycore Architectures

    In this paper, we propose a new cooperative caching method that improves the cache miss rate for manycore micro-architectures. The work is motivated by some limitations of recent adaptive cooperative caching proposals. Elastic Cooperative Caching (ECC) is a dynamic memory partitioning mechanism that allows cache to be shared across cooperative nodes according to the application behavior. However, it is mainly limited by the cache eviction rate in the case of a highly stressed neighborhood. Another system, the Adaptive Set-Granular Cooperative Caching (ASCC), is based on finer set-based mechanisms for better adaptability. However, heavy localized cache loads are not managed efficiently. In this context, we propose a cooperative caching strategy that consists of sliding data through closer neighbors. When a cache receives a request to store a neighbor's private block, it spills its least recently used private data to a close neighbor. Thus, solicited saturated nodes slide local blocks to their respective neighbors so as to always provide free cache space. We also propose a new Priority-based Data Replacement policy to decide efficiently which blocks should be spilled, and a new mechanism to choose the host destination, called the Best Neighbor selector. A first analytic performance evaluation shows that the proposed cache management policies reduce the average global communication rate by half. As frequent accesses are concentrated in the neighboring zones, on-chip traffic is improved. Finally, our evaluation shows that the cache miss rate is improved: each tile keeps the most frequently accessed data one hop away, instead of ejecting them off-chip. As shown in the experiments, the proposed techniques notably reduce the cache miss rate when the cooperative zone is highly solicited.

    Using the Spring Physical Model to Extend a Cooperative Caching Protocol for Many-Core Processors

    As the number of embedded cores grows, the off-chip memory wall becomes an overwhelming bottleneck. As a consequence, it is increasingly important to exploit on-chip data storage efficiently. In a previous work, we proposed a data sliding mechanism that allows data to be stored in the closest neighborhood, even under heavy stress loads. However, each cache block is allowed to migrate only once to a neighbor's cache (e.g. 1-Chance Forwarding). In this paper, we propose an extension of our mechanism in order to expand the cooperative caching area. Our work is based on an adaptive physical model, where each cache block is considered as a mass connected to a spring. This technique constrains data migration according to the spring constant and the difference in workloads between cores. This adaptive data sliding approach leads to a balanced spread of data on the chip and therefore improves on-chip storage. On-chip data access has been evaluated using an analytical approach. Results show that the extended data sliding increases the global cache hit rate on the chip, especially in the context of juxtaposed hot spots.

    Towards the Darwinian Classification of a Fossil Processor (Vers la classification darwinienne d'un processeur fossile)

    This paper has been awarded "Editor's Choice" of the RAFT 2008 proceedings. Evolutionists and creationists clash on every front to impose on the whole community their ideas about the disappearance of ancient species. The field of computer science research, and more particularly paleoprocessology, is all the more sensitive to this debate as the expansion of laboratories on campuses reveals the presence of a large number of still unidentified fossils. This article, a genuine case study, presents a protocol-driven experimental approach aimed at classifying an unidentified processum sorórem fossilis.

    Generating Code and Memory Buffers to Reorganize Data on Many-core Architectures

    The dataflow programming model has proven to be a relevant approach for efficiently running massively parallel applications on many-core architectures. In this model, some particular built-in agents are in charge of data reorganization between user agents. Such agents can Split, Join and Duplicate data onto their communication ports. They are widely used in signal processing, for example. These system agents, and their associated implementations, are of major importance when it comes to performance, because they can sit on the critical path (consider Amdahl's law). Furthermore, a particular data reorganization can be expressed by the developer in several ways, some of which lead to inefficient solutions (mostly unneeded data copies and transfers). In this paper, we propose several strategies to manage data reorganization at compile time, with a focus on indexed accesses to shared buffers to avoid data copies. These strategies are complementary: they ensure correctness for each system agent configuration, as well as performance when possible. They have been implemented within the Sigma-C industry-grade compilation toolchain and evaluated on the Kalray MPPA 256-core processor.

    A Fast Evaluation Approach of Data Consistency Protocols within a Compilation Toolchain

    Shared memory is a critical issue for large distributed systems. Although several data consistency protocols have been proposed, selecting the protocol that best suits the application requirements and system constraints remains a challenge. The development of multi-consistency systems, where different protocols can be deployed at runtime, appears to be an interesting alternative. In order to explore the design space of consistency protocols, a fast and accurate method should be used. In this work, we rely on a compilation toolchain that transparently handles data consistency decisions for a multi-protocol platform. We focus on the analytical evaluation of the consistency configuration, which sits within the optimization loop. We propose to use a TLM NoC simulator to get feedback on expected network contention. We evaluate the approach using five workloads and three different data consistency protocols. As a result, we are able to obtain a fast and accurate evaluation of the different consistency alternatives.

    Experimentations With CoRDAGe, A Generic Service For Co-Deploying and Re-Deploying Applications On Grids

    Computer grids are made of thousands of heterogeneous physical resources that belong to different administration domains. This makes using the grid very complex. In this paper, we focus on deploying distributed applications at a large scale. As application requirements often cannot be anticipated, dynamic re-deployment is needed; if various applications have to co-operate within a workflow, they should also be co-deployed in a consistent way. In a previous paper, we described the CoRDAGe deployment model and its architecture. It meets the three properties of transparency, versatility, and neutrality. In this paper, we report on its application to a real co-deployment over the Grid'5000 experimental platform, using different configurations, including multiple clients, multiple applications and multiple grid sites.

    CoRDAGe: towards transparent management of interactions between applications and ressources

    Nowadays, large-scale, grid-aware applications are intended to run for days or even weeks over hundreds or thousands of nodes. This requires new, and often painful, operations for the user in charge of deployment and monitoring. We claim that applications should manage their own run in an autonomic way, by requesting new resources on demand. In this paper, we introduce CoRDAGe, a third-party tool standing between applications and lower-level grid management tools. It provides generic and application-specific facilities to dynamically expand and retract the deployment of a grid-aware application according to its actual needs. A prototype has been implemented and preliminary testing has been conducted on the Grid'5000 testbed.